
    Learning Depth from Focus in the Wild

    For better photography, most recent commercial cameras, including smartphones, have either adopted large-aperture lenses to collect more light or used a burst mode that takes multiple images within a short time. These features lead us to revisit depth from focus/defocus. In this work, we present convolutional neural network-based depth estimation from a single focal stack. Our method differs from relevant state-of-the-art works in three unique ways. First, it infers depth maps in an end-to-end manner even when image alignment is required. Second, we propose a sharp region detection module to reduce blur ambiguities arising from subtle focus changes and weakly textured regions. Third, we design an effective downsampling module to ease the flow of focal information during feature extraction. In addition, to improve the generalization of the proposed network, we develop a simulator that realistically reproduces the characteristics of commercial cameras, such as changes in field of view, focal length, and principal point. By effectively incorporating these three features, our network achieves the top rank in the DDFF 12-Scene benchmark on most metrics. We also demonstrate the effectiveness of the proposed method through various quantitative evaluations and on real-world images taken with various off-the-shelf cameras, in comparison with state-of-the-art methods. Our source code is publicly available at https://github.com/wcy199705/DfFintheWild
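
    A minimal classical baseline helps make the focus-measure idea concrete. The sketch below is not the paper's network: it scores per-pixel sharpness with a smoothed squared Laplacian over a grayscale focal stack and takes the argmax slice as a depth proxy. The stack layout, the focus measure, and the smoothing sigma are illustrative assumptions.

```python
# Hedged sketch: classical depth-from-focus via a Laplacian focus measure.
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def depth_from_focus(stack, depths, sigma=2.0):
    """stack: (N, H, W) grayscale focal stack; depths: (N,) focus distances."""
    # Focus measure per slice: smoothed squared Laplacian response.
    focus = np.stack([gaussian_filter(laplace(s) ** 2, sigma) for s in stack])
    best = focus.argmax(axis=0)       # (H, W) index of the sharpest slice
    return np.asarray(depths)[best]   # map slice index to its focus distance
```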

    EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

    Capturing high-dimensional social interactions and feasible futures is essential for predicting trajectories. To address this complexity, several attempts have been made to reduce the dimensionality of the output variables via parametric curve fitting, such as the Bézier curve and the B-spline function. However, these functions, which originate in computer graphics, are not well suited to modeling socially acceptable human dynamics. In this paper, we present EigenTrajectory ($\mathbb{ET}$), a trajectory prediction approach that uses a novel trajectory descriptor to form a compact space, referred to here as $\mathbb{ET}$ space, in place of Euclidean space for representing pedestrian movements. We first reduce the complexity of the trajectory descriptor via a low-rank approximation. We then transform pedestrians' history paths into our $\mathbb{ET}$ space, represented by spatio-temporal principal components, and feed them into off-the-shelf trajectory forecasting models. The inputs and outputs of the models, as well as social interactions, are all gathered and aggregated in the corresponding $\mathbb{ET}$ space. Lastly, we propose a trajectory anchor-based refinement method to cover all possible futures in the proposed $\mathbb{ET}$ space. Extensive experiments demonstrate that our EigenTrajectory predictor significantly improves both the prediction accuracy and the reliability of existing trajectory forecasting models on public benchmarks, indicating that the proposed descriptor is well suited to representing pedestrian behaviors. Code is publicly available at https://github.com/inhwanbae/EigenTrajectory. (Accepted at ICCV 2023)
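
    The low-rank descriptor can be sketched with a plain truncated SVD: stack flattened training trajectories, keep the top-k right singular vectors as the $\mathbb{ET}$ basis, and project paths into and out of that space. The shapes, the value of k, and the function names below are illustrative assumptions, not the released implementation.

```python
# Hedged sketch: a truncated-SVD trajectory basis and its projections.
import numpy as np

def fit_et_basis(trajs, k=6):
    """trajs: (M, T, 2) training paths -> (k, T*2) principal basis."""
    X = trajs.reshape(len(trajs), -1)                # flatten each trajectory
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]                                    # spatio-temporal principal components

def to_et_space(traj, basis):
    return basis @ traj.reshape(-1)                  # (k,) compact descriptor

def from_et_space(coeff, basis, T):
    return (coeff @ basis).reshape(T, 2)             # low-rank reconstruction
```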

    EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images

    Light field cameras capture both the spatial and angular properties of light rays in space. Owing to this property, one can compute depth from light fields in uncontrolled lighting environments, which is a big advantage over active sensing devices. Depth computed from light fields can be used for many applications, including 3D modelling and refocusing. However, light field images from hand-held cameras have very narrow baselines and suffer from noise, making depth estimation difficult. Many approaches have been proposed to overcome these limitations, but there is a clear trade-off between accuracy and speed among them. In this paper, we introduce a fast and accurate light field depth estimation method based on a fully-convolutional neural network. Our network is designed with the light field geometry in mind, and we also overcome the lack of training data by proposing light field-specific data augmentation methods. We achieved the top rank in the HCI 4D Light Field Benchmark on most metrics, and we also demonstrate the effectiveness of the proposed method on real-world light field images. (Accepted to CVPR 2018, 10 pages)
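
    The geometry-aware design can be illustrated by how multi-stream inputs are sliced from a 4D light field: one view stack per epipolar direction (horizontal, vertical, and the two diagonals) through the center view. The array layout below is an assumption; the released code may organize views differently.

```python
# Hedged sketch: slice four directional view stacks from a 4D light field.
import numpy as np

def directional_stacks(L):
    """L: (U, V, H, W) grayscale light field with odd U == V."""
    U, V = L.shape[:2]
    c = U // 2                                             # center view index
    horiz = L[c, :]                                        # horizontal viewpoints
    vert = L[:, c]                                         # vertical viewpoints
    diag1 = np.stack([L[i, i] for i in range(U)])          # main diagonal
    diag2 = np.stack([L[i, V - 1 - i] for i in range(U)])  # anti-diagonal
    return horiz, vert, diag1, diag2
```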

    High-fidelity 3D Human Digitization from Single 2K Resolution Images

    High-quality 3D human body reconstruction requires high-fidelity, large-scale training data and a network design that effectively exploits high-resolution input images. To tackle these problems, we propose a simple yet effective 3D human digitization method called 2K2K, which constructs a large-scale 2K human dataset and infers 3D human models from 2K resolution images. The proposed method separately recovers the global shape of a human and its details. A low-resolution depth network predicts the global structure from a low-resolution image, and a part-wise image-to-normal network predicts the details of the 3D human body structure. A high-resolution depth network then merges the global 3D shape and the detailed structures to infer high-resolution front and back depth maps. Finally, an off-the-shelf mesh generator reconstructs the full 3D human model; our implementation is available at https://github.com/SangHunHan92/2K2K. In addition, we provide 2,050 3D human models, including texture maps, 3D joints, and SMPL parameters, for research purposes. In experiments, we demonstrate competitive performance compared with recent works on various datasets. (Accepted to CVPR 2023, Highlight)
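
    The coarse-to-fine merge can be sketched as simple frequency blending: keep the low-frequency global shape from the upsampled coarse depth and add back only the high-frequency detail of a high-resolution prediction. The paper's high-resolution depth network learns this merge; the blur radius and the resize method below are illustrative assumptions.

```python
# Hedged sketch: merge a coarse global depth with fine high-frequency detail.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def merge_depths(coarse, fine, sigma=8.0):
    """coarse: (h, w) global depth; fine: (H, W) detailed depth, same aspect."""
    factors = (fine.shape[0] / coarse.shape[0], fine.shape[1] / coarse.shape[1])
    up = zoom(coarse, factors, order=1)           # bilinear upsample to (H, W)
    detail = fine - gaussian_filter(fine, sigma)  # high-frequency residual
    return up + detail                            # global shape + local detail
```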

    Facile and versatile ligand analysis method of colloidal quantum dot

    Colloidal quantum dots (QDs) are highly attractive materials for various optoelectronic applications owing to their easy maneuverability, high functionality, wide applicability, and low cost of mass production. QDs usually consist of two components: the inorganic nano-crystalline particle and the organic ligands that passivate the surface of the inorganic particle. The organic component is also critical for tuning the electronic properties of QDs as well as for solubilizing QDs in various solvents. However, despite extensive effort to understand the chemistry of ligands, it has been challenging to develop an efficient and reliable method for identifying and quantifying the ligands on the QD surface. Herein, we develop a novel method of analyzing ligands in a mild yet accurate fashion. We found that oxidizing agents, acting as a heterogeneous catalyst in a phase separate from the QDs, can efficiently disrupt the interaction between the inorganic particle and the organic ligands, and that a subsequent simple phase-fractionation step can isolate the ligand-containing phase from the oxidizer-containing phase and the insoluble precipitates. Our analysis procedure minimizes the exposure of ligand molecules to oxidizing agents and yields homogeneous samples that can be readily analyzed by diverse analytical techniques, such as nuclear magnetic resonance spectroscopy and gas chromatography-mass spectrometry. © 2021, The Author(s).

    Disentangled Multi-Relational Graph Convolutional Network for Pedestrian Trajectory Prediction

    Pedestrian trajectory prediction is one of the important tasks required for autonomous navigation and for social robots operating in human environments. Previous studies focused on estimating social forces among individual pedestrians. However, they did not consider the social forces that groups exert on pedestrians, which results in over-collision-avoidance problems. To address this problem, we present a Disentangled Multi-Relational Graph Convolutional Network (DMRGCN) for socially entangled pedestrian trajectory prediction. We first introduce a novel disentangled multi-scale aggregation to better represent social interactions among pedestrians on a weighted graph. For the aggregation, we construct multi-relational weighted graphs based on distances and relative displacements among pedestrians. In the prediction step, we propose a global temporal aggregation to alleviate the accumulated errors for pedestrians who change their direction. Finally, we apply DropEdge to our DMRGCN to avoid overfitting on relatively small pedestrian trajectory datasets. By effectively incorporating these three parts within an end-to-end framework, DMRGCN achieves state-of-the-art performance on a variety of challenging trajectory prediction benchmarks.
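
    A small sketch of the graph construction: one weighted adjacency per relation (here, one per distance scale computed from pairwise positions; relative displacements would add further relations), plus DropEdge-style random edge removal. The scales, the Gaussian kernel, and the drop probability are illustrative assumptions, not the paper's values.

```python
# Hedged sketch: multi-relational weighted graphs plus DropEdge.
import numpy as np

def relation_graphs(pos, scales=(1.0, 2.0, 4.0)):
    """pos: (N, 2) pedestrian positions -> (R, N, N) weighted adjacencies."""
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)  # pairwise distances
    A = np.stack([np.exp(-((d / s) ** 2)) for s in scales])   # one relation per scale
    for a in A:
        np.fill_diagonal(a, 0.0)     # no self-loops
    return A

def drop_edge(A, p=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(A.shape) >= p  # keep each edge with probability 1 - p
    return A * mask                  # randomly sparsified adjacency
```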

    A Set of Control Points Conditioned Pedestrian Trajectory Prediction

    Predicting the trajectories of pedestrians in crowded settings is an important task for applications such as autonomous navigation systems. Previous studies have tackled this problem using two strategies: they either (1) infer all future steps recursively, or (2) predict the potential destinations of pedestrians at once and interpolate the intermediate steps to reach them. However, these strategies suffer from the accumulated errors of recursive inference, or from restrictive assumptions about social relations along the intermediate path. In this paper, we present a graph convolutional network-based trajectory prediction method. First, we propose a control point prediction that divides the future path into three sections and infers the intermediate destinations of pedestrians to reduce the accumulated error. To do this, we construct multi-relational weighted graphs to account for pedestrians' physical and complex social relations. We then introduce a trajectory refinement step based on a spatio-temporal and multi-relational graph. By considering the social interactions between neighbors, better prediction results are achieved. In experiments, the proposed network achieves state-of-the-art performance on various real-world trajectory prediction benchmarks.
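
    The control-point parameterization can be sketched as piecewise interpolation: given the current position and three predicted control points that split the future path into sections, fill in the intermediate steps of each section. The paper refines these steps with a graph network; the linear interpolation and the step counts below are illustrative assumptions.

```python
# Hedged sketch: interpolate a future path through predicted control points.
import numpy as np

def interpolate_path(start, controls, steps_per_section=4):
    """start: (2,); controls: (3, 2) section endpoints -> (12, 2) path."""
    path, prev = [], np.asarray(start, dtype=float)
    for cp in np.asarray(controls, dtype=float):
        for t in np.linspace(0.0, 1.0, steps_per_section, endpoint=False)[1:]:
            path.append(prev + t * (cp - prev))  # intermediate steps
        path.append(cp)                          # land on the control point
        prev = cp
    return np.stack(path)
```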

    Task-Specific Scene Structure Representations

    Understanding the informative structures of scenes is essential for low-level vision tasks. Unfortunately, it is difficult to give a concrete visual definition of these informative structures because the influence of each visual feature is task-specific. In this paper, we propose a single general neural network architecture for extracting task-specific structure guidance for scenes. To do this, we first analyze traditional spectral clustering methods, which compute a set of eigenvectors to model a segmented graph that forms small compact structures on the image domain. We then unfold the traditional graph-partitioning problem into a learnable network, named the Scene Structure Guidance Network (SSGNet), to represent task-specific informative structures. SSGNet yields a set of eigenvector coefficients that produces explicit feature representations of image structures. In addition, SSGNet is lightweight (56K parameters) and can be used as a plug-and-play module for off-the-shelf architectures. We optimize SSGNet without any supervision by proposing two novel training losses that enforce task-specific scene structure generation during training. Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications, including joint upsampling and image denoising. We also demonstrate that SSGNet generalizes well to unseen datasets, in contrast to existing methods that use structural embedding frameworks. Our source code is available at https://github.com/jsshin98/SSGNet.
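
    The spectral-clustering starting point can be sketched directly: build a 4-connected pixel affinity graph from intensity differences and take the first nontrivial eigenvectors of its normalized Laplacian as structure embeddings; SSGNet learns coefficients for such eigenvectors rather than solving the eigenproblem. The affinity kernel, beta, and the dense loop (only practical for tiny images) are illustrative assumptions.

```python
# Hedged sketch: structure eigenvectors from a normalized graph Laplacian.
import numpy as np
from scipy.sparse import diags, identity, lil_matrix
from scipy.sparse.linalg import eigsh

def structure_eigenvectors(img, k=4, beta=10.0):
    """img: (H, W) floats in [0, 1] -> (H*W, k) structure embedding."""
    H, W = img.shape
    n = H * W
    A = lil_matrix((n, n))
    for y in range(H):                    # 4-connected pixel affinity graph
        for x in range(W):
            i = y * W + x
            for dy, dx in ((0, 1), (1, 0)):
                yy, xx = y + dy, x + dx
                if yy < H and xx < W:
                    w = np.exp(-beta * (img[y, x] - img[yy, xx]) ** 2)
                    A[i, yy * W + xx] = A[yy * W + xx, i] = w
    d = np.asarray(A.sum(axis=1)).ravel()
    Dinv = diags(1.0 / np.sqrt(d))
    Lsym = identity(n) - Dinv @ A.tocsr() @ Dinv   # normalized Laplacian
    vals, vecs = eigsh(Lsym, k=k + 1, which="SM")  # smallest eigenvalues
    vecs = vecs[:, np.argsort(vals)]
    return vecs[:, 1:]                             # drop the trivial eigenvector
```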